fix: prevent duplicate artifact sends across turns in WeCom channel #2728

Open
highland0971 wants to merge 1 commit into bytedance:main from
highland0971:pr/fix-artifact-duplicate-sending
PR 2: Fix Duplicate Artifact File Sending Across Turns

Problem

When using the present_files tool to generate file artifacts, the system
sends all historically generated files in the conversation on every new
turn, rather than only newly-produced artifacts.

Additionally, if DeerFlow or its Gateway restarts (so that a new thread_id
is assigned to the same IM conversation), all previously sent files are
re-sent because the sent-artifact tracking is lost.

Root Cause

The original _extract_artifacts function was designed to scan the message
history and extract file paths from present_files tool calls made after the
last human message only. However:

  1. The original implementation had a subtle bug: it iterated messages in
    reverse and stopped at the first type == "human" message, but
    present_files tool calls in subsequent turns could be missed if
    the reverse scan logic was incorrect.
  2. There was no persistence of already-sent artifacts — every turn
    re-extracted artifacts from the message history, which contained all
    past tool calls.
  3. After a restart, the ChannelStore's thread_id mapping changed but
    there was no cross-restart tracking of sent artifacts.
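
For orientation, the intended behaviour of the reverse scan described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the original _extract_artifacts code: the message shape (dicts with "type" and "tool_calls") and the "filepaths" argument name are assumptions made for illustration.

```python
from typing import Any, Dict, List


def extract_since_last_human_sketch(messages: List[Dict[str, Any]]) -> List[str]:
    """Collect present_files paths from messages after the last human message.

    Hypothetical sketch: the dict layout and the "filepaths" key are assumed.
    """
    paths: List[str] = []
    # Walk backwards and stop at the most recent human message (turn boundary).
    for msg in reversed(messages):
        if msg.get("type") == "human":
            break
        for call in msg.get("tool_calls", []):
            if call.get("name") == "present_files":
                paths.extend(call.get("args", {}).get("filepaths", []))
    return paths


history = [
    {"type": "human", "content": "make a report"},
    {"type": "ai", "tool_calls": [
        {"name": "present_files", "args": {"filepaths": ["/out/report.md"]}},
    ]},
    {"type": "human", "content": "now a chart"},
    {"type": "ai", "tool_calls": [
        {"name": "present_files", "args": {"filepaths": ["/out/chart.png"]}},
    ]},
]
latest = extract_since_last_human_sketch(history)
```

A function like this is stateless: it has no record of what was actually delivered, so nothing guards against re-sending if the turn boundary is misjudged or the history is replayed after a restart.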

Fix

store.py — New sent-artifact persistence API

  • Added get_sent_artifacts(channel_name, chat_id, topic_id=None) and
    set_sent_artifacts(channel_name, chat_id, paths, topic_id=None) methods.
  • Keys are stored as sent_artifacts:<channel>:<chat_id>[:<topic_id>] in
    the existing JSON-backed ChannelStore, ensuring survival across server
    restarts.
  • Keyed by IM conversation identity (channel + chat), not by internal
    thread_id, so the record persists even when Gateway restarts assign a
    new internal thread.
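
A minimal sketch of how such a persistence API could look, assuming a single JSON file that is loaded on construction and rewritten on every update. The class name JsonStoreSketch and the helper sent_artifacts_key are hypothetical stand-ins; only the method names, their parameters, and the key format come from the description above, and the real ChannelStore internals may differ.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def sent_artifacts_key(channel_name: str, chat_id: str,
                       topic_id: Optional[str] = None) -> str:
    # Key format from the PR text: sent_artifacts:<channel>:<chat_id>[:<topic_id>]
    key = f"sent_artifacts:{channel_name}:{chat_id}"
    if topic_id is not None:
        key += f":{topic_id}"
    return key


class JsonStoreSketch:
    """Hypothetical stand-in for ChannelStore, backed by one JSON file."""

    def __init__(self, path: Path) -> None:
        self._path = path
        # Reload existing records so they survive a process restart.
        self._data: Dict[str, List[str]] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def get_sent_artifacts(self, channel_name: str, chat_id: str,
                           topic_id: Optional[str] = None) -> List[str]:
        key = sent_artifacts_key(channel_name, chat_id, topic_id)
        return list(self._data.get(key, []))

    def set_sent_artifacts(self, channel_name: str, chat_id: str,
                           paths: Iterable[str],
                           topic_id: Optional[str] = None) -> None:
        key = sent_artifacts_key(channel_name, chat_id, topic_id)
        # Store a de-duplicated, sorted list so the file stays stable.
        self._data[key] = sorted(set(paths))
        self._path.write_text(json.dumps(self._data, indent=2))
```

Because the key contains no thread_id, a second JsonStoreSketch opened on the same file after a restart returns the same list for the same channel and chat.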

manager.py — New artifact diff logic

  • Replaced the reverse-scan approach with a state-based extraction from
    the artifacts state key (already normalized by present_file_tool).
  • Added _diff_artifacts() function that compares extracted artifact paths
    against the store's sent_artifacts record for that conversation, returning
    only newly-produced artifacts.
  • Updated both _handle_chat and _handle_streaming_chat to pass
    channel_name, chat_id, topic_id, and store to the new diff logic.
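
The diff step can be illustrated with a short sketch. The function below is named diff_artifacts_sketch to make clear that it is not the PR's _diff_artifacts(), whose exact signature is not shown here; it assumes artifacts are plain path strings read from the artifacts state key.

```python
from typing import Iterable, List


def diff_artifacts_sketch(extracted: Iterable[str],
                          already_sent: Iterable[str]) -> List[str]:
    """Return paths in `extracted` that are not in `already_sent`.

    Input order is preserved and repeated paths are returned only once.
    """
    seen = set(already_sent)
    new_paths: List[str] = []
    for path in extracted:
        if path not in seen:
            seen.add(path)
            new_paths.append(path)
    return new_paths


# Illustrative values only: state as it might look after a third turn.
state = {"artifacts": ["/out/a.md", "/out/b.md", "/out/c.md"]}
sent = ["/out/a.md", "/out/b.md"]
new = diff_artifacts_sketch(state.get("artifacts", []), sent)
print(new)  # ['/out/c.md']
```

After a successful send, the caller would write the union of the old record and the new paths back to the store so the next turn diffs against it.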

Usage Flow

  1. On first turn with artifacts: all artifacts are sent, recorded in store.
  2. On subsequent turns: only artifacts NOT in the store are sent.
  3. After restart: new thread_id, but sent_artifacts lookup uses
    channel:chat_id → previous records found → no duplicates sent.
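
The three steps above can be walked through with a small simulation. It is a sketch under stated assumptions: an in-memory dict stands in for the persisted store, handle_turn_sketch is a hypothetical helper rather than _handle_chat, and the paths and thread IDs are made up for illustration.

```python
from typing import Dict, List, Optional

# Stands in for the JSON-backed store; in the PR this survives restarts on disk.
store: Dict[str, List[str]] = {}


def handle_turn_sketch(channel_name: str, chat_id: str, thread_id: str,
                       artifacts: List[str],
                       topic_id: Optional[str] = None) -> List[str]:
    """Return the artifacts to send this turn and record them as sent."""
    # thread_id is deliberately not part of the key, so a new thread
    # for the same chat still finds the earlier record.
    key = f"sent_artifacts:{channel_name}:{chat_id}"
    if topic_id is not None:
        key += f":{topic_id}"
    sent = store.get(key, [])
    to_send = [p for p in dict.fromkeys(artifacts) if p not in sent]
    store[key] = sent + to_send
    return to_send


# 1. First turn with artifacts: everything is sent and recorded.
first = handle_turn_sketch("wecom", "chat-1", "thread-A", ["/out/a.md"])
# 2. Subsequent turn: only the path not yet in the store is sent.
second = handle_turn_sketch("wecom", "chat-1", "thread-A",
                            ["/out/a.md", "/out/b.md"])
# 3. After a restart: new thread_id, same channel and chat, nothing re-sent.
after_restart = handle_turn_sketch("wecom", "chat-1", "thread-B",
                                   ["/out/a.md", "/out/b.md"])
```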

Testing

  • Verified in production: with 4 historical artifacts in the conversation,
    only the 2 new artifacts were sent on subsequent turns.
  • Verified cross-restart persistence: after service restart, sent-artifact
    records were correctly loaded from store.json.

Files Changed

  • backend/app/channels/store.py
  • backend/app/channels/manager.py
